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IMAGE  REPRESENTATION  USING  FAST  ALGORITHMS 
BASED  ON  THE  ZAK  TRANSFORM 

L  GENERAL  INTRODUCTION 

Visual  images  can,  in  many  contexts,  be  most  efficiently  represented  by  decotr5)osing  them 
into  spectral  functions  that  can  then  be  added  together  to  reconstruct  the  image  (cf. ,  Geii,  Zeevi 
&  Porat,  1990).  Further,  the  human  visual  system  has  certain  properties  which  suggest  the  most 
appropriate  characteristics  for  the  spectral  functions  used  to  represent  visual  imagery.  As  noted 
above,  the  human  visual  system  is  spatially  inhomogeneous,  and  so  the  most  appropriate  basis 
functions  are  those  that  are  localized  (i.e.,  that  have  finite  extent).  Also,  the  visual  mechanisms 
presumed  to  xmderhe  form  discrimination  are  symmetrical,  and  so  it  may  be  usefiil  for  the  basis 
functions,  used  to  represent  visual  imageiy,  to  be  symmetrical.  Finally,  it  is  most  appropriate  to 
represent  images  usmg  orthogonal  basis  functions  which  allow  representation  with  minimal 
redundancy  and  with  the  fewest  number  of  coefficients. 

The  foregoing  suggests  that  the  ideal  set  of  basis  functions,  for  representing  visual  images, 
should  be  localized,  symmetrical,  and  orthogonal.  The  problem  is  that  a  basis  with  finite  support 
(i.e.,  one  that 'is  spatially  localized)  cannot  be  both  orthogonal  and  symmetrical.  Therefore,  as  a 
practical  matter,  the  chosen  basis  must  represent  a  compromise  among  the  properties  of 
localization,  symmetry,  and  orthogonahty.  One  example  of  a  popular  basis  is  the  Gabor  functions 
(cf  Porat  &  Zeevi,  1988),  which,  although  not  orthogonal,  provide  good  combined  localization  in 
position  and  spatial-frequency.  Another  well-known  basis  is  the  wavelets  devised  by  Daubechies 
(1988),  which  are  localized  and  orthogonal  but  are  not  symmetrical.  In  the  present  experiment, 
we  describe  three  bases  derived  from  window  functions  that  have  various  desirable  properties  in 
the  context  of  image  representation.  We  begin  in  Part  n  by  attercpting  an  intuitive  description  of 
the  Zak  transform  (ZT)  based  on  an  analogy  with  the  more  famihar  Fourier  transform 

In  Part  HI,  we  present  a  mathematical  technique  for  analyzing  images  based  on 
two-dimensional  Hermite  fimctions  that  are  translated  in  both  space  and  spatial  fi-equency. 
Although  the  translated  functions  are  not  orthogonal,  they  do  constitute  a  frame  and  hence  can  be 
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used  for  image  expansion.  The  technique  has  the  practical  advantage  that  fast  algorithms  based 
on  the  ZT  can  be  used  to  compute  expansion  coefficients.  We  describe  properties  of  the  ZT  that 
are  relevant  to  image  representation,  and  which  allow  us  to  use  the  ZT  to  both  efficiently  conpute 
expansion  coefficients  and  to  reconstruct  images  from  them  Finally,  we  use  a  Hermite  function 
frame  to  decompose  and  reconstruct  a  texture  image. 

In  Part  IV,  an  image  representation  technique  is  described,  which  uses  a  window  function 
that  provides  good  localization  in  both  pace  and  patial  frequency.  The  window  function  is 
obtained  by  weighting  the  ZT  of  a  gaussian.  The  weighting  procedure  eliminates  the  zero  in  the 
ZT  thus  allowing  efficient  and  stable  computation  of  expansion  coefficients  with  repect  to  the 
derived  window  function.  Since  the  window  function  is  related  to  Gabor  fimctions  and,  in 
addition,  resembles  a  visual  recptive  field,  it  may  also  be  useful  in  visual  representation  and 
modeling. 

Finally,  in  Part  V,  we  describe  an  application  of  the  discrete  cosiue  transform  (DCT)  to  the 
representation  of  images  iu  the  combiued  position/patial-frequency  domain.  The  major 
computational  tool  for  calculating  the  coefficients  is  what  we  call  the  discrete  Zak-Cosine 
transform  (DZCT).  The  technique  is  mathematically  complete  in  that  the  original  image  can  be 
reconstructed  exactly. 


n.  INTRODUCTION  TO  THE  ZAK  TOANSFORM 


Fourier  transform  techniques  are  inherently  limited  in  that  their  associated  basis  functions 
are  theoretically  infinite  in  extent  and  hence  cannot  adequately  represent  the  localized  features 
typical  of  most  real-world  images.  This  limitation  can  be  addressed  by  combined 
position/patial-frequency  representations  based  on  the  Wigner  distribution,  ambigmty  functions, 
Gabor  functions,  etc.  (cf,  Jacobson  &  Wechsler,  1988).  Although  the  Zak  Transform  (ZT)  was 
originally  developed  in  the  context  of  quantum  mechanics  (Zak,  1967),  it  has  been  used 
extensively  in  other  areas  of  both  pime  (Janssen,  1982)  and  applied  (Auslander,  Gertner  & 
Tolimieri,  1991;  Bergmans  &  Janssen,  1987)  research.  Further,  as  noted  by  Zeevi  and  Gertner 
(1992),  the  ZT  can  be  used  in  image  representation  to  map  a  two-dimensional  signal  (such  as  an 
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image)  into  a  four-dimensional  space  which  can  be  interpreted  as  consisting  of  two  spatial 
dimensions  and  two  spatial-frequency  dimensions.  It  will  now  be  shown  that  the  ZT  is,  in  a  sense, 
a  partial  or  intermediate  result  of  the  Fast  Fourier  Transform  (FFT),  and  that  it  provides  an 
effective  joint  position/spatial-frequency  representation.  The  following  discussion  of  the  Zak 
transform  considers  one-dimensional  signals  only  and  is  meant  to  be  elementary  an  intuitive.  A 
more  detailed  description  of  the  ZT,  its  properties,  and  its  application  to  two-dimensional  signals 
(images)  will  be  presented  in  Part  m. 

Consider  first  an  eight-element,  one-dimensional  vector  of  data  as  shown  at  the  top  of 
Figure  1.  An  FFT  can  be  performed  on  this  vector  as  follows:  1)  Decorrq)ose  the  vector  into 
smaller  vectors  and  rearrange  them  as  shown  hi  the  second  and  third  panels  of  Figure  1.  This 
maps  the  eight-element  vector  into  a  two-dimensional  4x2-element  array.  2)  Compute  an  FFT,  of 
length  two,  on  each  of  the  four  columns.  3)  Multiply  by  an  appropriate  phase  factor.  4)  Conpute 
an  FFT,  of  length  four,  on  each  of  the  two  rows.  5)  Remdex  back  to  a  one-dimensional  array  that 
now  represents  frequency. 

Now  consider  the  discrete  ZT  as  apphed  to  a  16-element  vector.  The  individual  data 
points,  which  are  at  a  relatively  fine  scale,  are  first  partitioned  into  a  coarser  scale,  as  shown  at  the 
top  of  Figure  2.  The  samples  corresponding  to  each  index  on  the  coarser  scale  are  then  mapped 
into  rows  forming  a  4x4  array  as  shown  in  the  second  panel  of  Figure  2.  Each  cell  in  the  resultant 
array  is  thus  represented  by  two  numbers— one  correspondmg  to  the  coarse  scale  and  one 
corresponding  to  the  fine  scale.  Next,  an  FFT  is  performed  on  the  first  row  of  the  array  that 
represents  the  set  of  first  samples,  on  the  fine  scale,  within  each  coarse  block.  The  results  (i.e., 
the  Fourier  coefficients)  are  placed  in  the  corresponding  row  of  a  second  array,  which  therefore 
contains  the  FT  of  the  set  of  first-fine  sartples  within  each  coarse  block.  This  process  is 
continued  for  each  row  in  the  original  array,  so  that  each  row  in  the  second  array  contains  the 
coefficients  associated  with  successive  sets  of  fine  sanples.  As  a  result  of  this  procedure,  we  now 
have  a  two-dimensional  representation  of  the  original  one-dimensional  signal.  We  have  lost  some 
resolution  in  one  dimension  (the  original  data  dimension,  e.g.,  time  or  position),  but  we  have 
added  resolution  in  another  (the  transform  or  frequency)  dimension. 
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Steps  in  Performing  a  1-D  Fast  Fourier  Transform 
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By  analogy  with  Figure  1,  Figure  2c  shows  that  the  ZT  is  an  intermediate  step  in  an 
ordinary  FFT.  This  intermediate  procedure  has  several  interesting  and  useful  properties.  First, 
the  ZT  is  quasi-periodic  (i.e.,  its  absolute  value  is  periodic  in  both  of  the  dimensions  described 
above),  so  that  the  ZT  at  a  higher  frequency  can  be  obtained  by  smq)ly  extending  the  ZT  at  a 
lower  frequency.  Second,  the  ZT  of  a  shifted  (in  either  or  both  dimensions)  gaussian  is  equal  to 
the  ZT  of  an  undiifted  gaussian  times  a  phase  factor.  It  is  also  a  well-known  FT  property  that  a 
shift  in  time,  for  instance,  is  equivalent  to  multiplying  by  a  phase  factor  in  frequency.  However, 
this  property  is  even  more  powerful  in  the  context  of  the  ZT  since  it  applies  to  both  (all) 
dimensions.  As  a  result,  and  as  is  more  fully  described  hi  the  Introduction  to  Part  IV,  rather  than 
taking  the  ZT  of  gaussians  at  all  positions  and  frequencies,  it  is  only  necessary  to  calculate  the  ZT 
of  a  single  gaussian  and  multiply  it  by  the  appropriate  (complex-exponential)  phase  factors. 

The  ZT  properties  just  described,  as  well  as  others,  will  be  discussed  in  more  detail  in  the 
next  section,  where  a  technique  is  described  for  calculating  Gabor-like  expansion  coefficients 
using  the  ZT  of  window  functions  related  to  gaussian  derivatives. 

m.  IMAGE  REPRESENTATION  USING  HERMITE  FUNCTIONS 
Introduction 

Most  features  of  interest  in  natural  and  man-made  images  are  spatially  localized.  It  is, 
therefore,  not  surprisitig  that  the  mammalian  visual  system  has  evolved  properties  to  deal  with 
localized  features.  One  such  property  is  the  visual  receptive  field  (VRF),  the  most  sahent 
characteristics  of  which  are  its  limited  spatial  extent  and  the  form  of  its  sensitivity  profile.  Early 
models  of  the  visual  system  suggested  that  the  VRF  was  analogous  to  a  Fourier  analyzer 
(Braddick,  Campbell  &  Atkinson,  1978).  This  analogy  was  known  even  then  to  be  inadequate 
since  Fourier  analysis  is  global  in  nature  and  thus  cannot  adequately  model  the  visual  response  to 
localized  spatial  features.  One  of  the  more  popular  models  suggested  for  addressing  this 
inadequacy  involved  the  introduction  of  Gabor  functions  (Daugman,  1980;  Marcelja,  1980)  to 
replace  the  sinusoids  used  in  Fourier  analysis.  Gabor  functions  (i.e.,  gaussian-weighted  sinusoids) 
are  spatially  localized,  and  were  initially  popular  because  they  were  known  to  provide  the  highest 
conjoint  resolution  in  the  spatial  and  ^atial-frequency  domains.  Two-dimensional  Gabor 
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functions  have  also  been  found  to  resemble  the  VRF  of  many  cortical  cells  (Daugman,  1988; 
Jones  &  Palmer,  1987). 

Many  functions,  in  addition  to  Gabor  functions,  have  been  proposed  to  describe  VRF 
profiles.  These  include  differences  of  gaussians  (Kulikowski,  Marcelja  &  Bishop,  1982;  Rodieck 
&  Stone,  1965),  Laplacians  of  gaussians  (Marr  &  Hildreth,  1980),  and  gaussian  derivatives  (Stork 
&  Wilson,  1990;  Young,  1987).  It  is  no  coincidence  that  all  of  these  fimctions  involve  gaussians 
in  some  form  As  noted  by  Marr  (1982)  and  by  Koenderink  (1984),  efficient  feature  extraction 
requires  that  images  be  analyzed  at  multiple  spatial  scales.  Thus,  analysis  at  a  particular  scale 
involves  the  use  of  blurring  to  remove  details  at  higher  levels.  The  gaussian  is  uniquely  suited  to 
this  role  in  that  it  is  smooth  and  localized,  and  hence  least  likely  to  introduce  either  spatial  or 
spectral  artifacts. 

Koenderink  and  van  Doom  (1990)  have  suggested  a  taxonomy  of  mathematically 
allowable  VRF  profiles  based  on  sets  of  functions  consistent  with  known  properties  of  the  visual 
system  such  as  size  invariance,  absence  of  spurious  resolution,  and  effectively  continuous  spatial 
sampling.  They  note  that  the  functions  which  describe  the  VRFs  that  possess  these  properties  are 
solutions  of  a  particular  differential  equation,  and  that  one  family  of  solutions  is  the  weighted 
Hermite  polynomials  (Yang,  1992).  Further,  Zucker  and  Hummel  (1986)  have  suggested  that  the 
spatial  inteipolation  functions  required,  for  instance,  to  explain  visual  hyperacuity  should  have 
spatial  support  that  is  less  than  that  provided  by  more  traditional  sinc-fimctions.  They 
recommended  the  use  of  Hermite  polynomials,  which  are  more  local  than  sinc-fimctions,  and 
which  are  related  to  gaussian  derivatives  and  hence  are  appropriate  for  characterizing  gaussian 
VRFs.  Finally,  the  use  of  Hermite  polynomials  is  also  supported  by  the  curve-Jfitting  analysis 
performed  by  Young  (1987),  who  tested  several  candidate  functions  for  representing  the  VRFs 
derived  from  the  visual  responses  of  various  mammahan  species.  Although  many  of  the  fimctions 
gave  acceptable  fits  to  certain  of  the  data,  he  concluded  that  the  gaussian  derivative  (also  a  type  of 
weighted  Hermite  polynomial)  provided  the  best  overall  fit. 

In  the  present  paper  we  describe  a  technique  for  decortq)osing  images  in  a  combined 
position/spatial-fi'equency  domain,  using  a  fi-ame  derived  fi’om  Hermite  fimctions  (i.e.,  Hermite 
polynomials  multiphed  by  gaussians).  A  fi^ame  is  a  generalization  of  a  basis,  which  allows 
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expansion  using  a  broader  class  of  functions  (cf.,  Daubechies,  1992).  Unlike  Martens  (1990), 
who  decomposed  images  with  respect  to  an  orthonormal  family  of  Hermite  functions  (i.e.,  a 
basis),  we  generate  a  Gabor-like  frame  using  any  one  of  the  Hermite  functions  as  a  window.  This 
frame  is  generated  by  shifting  the  chosen  Hermite  fimction  to  all  possible  discrete  image  positions. 
At  each  position  a  Gabor-like  expansion  is  performed  using  spatial  frequencies  matched  to  the 
image.  We  also  describe  a  fast  algorithm,  based  on  the  Zak  transform,  for  conqiuting  expansion 
coefficients  relative  to  the  proposed  frame. 


Image  Representation  in  the  Position/Spatial-Frequency  Domain 

The  hiimati  visual  system  is  spatially  inhomogeneous  and  thus  visual  information  is  most 
efficiently  represented  in  both  position  and  spatial  frequency.  Many  methods  have  been  proposed 
for  representing  images  in  a  joint  position/spatial-frequency  domain  (Jacobson  &  Wechsler, 
1988).  All  of  these  methods  are  based  on  approximations  to  the  Wigner  distribution,  wherein 
finite  images  are  assumed  to  be  two-dimensional,  continuous,  integrable  functions,  and  the  jomt 
representation  is  computed  by  using  integration  with  infinite  bounds.  Since  integrals  with  infinite 
bounds  are  not  always  computable,  various  approximations  have  been  introduced  (Jacobson  & 
Wechsler,  1988).  In  order  to  avoid  these  approximations,  we  propose  a  decomposition  into  a 
finite,  positiori/2-D  spatial  frequency  domain  as  follows: 

We  represent  the  image,  I(n^,  nj,  in  a  realistic  (i.e.,  finite)  form  as  an  array  of  points 
where  n  and  n  are  the  row  and  column  indices,  respectively,  of  the  image  pixels.  Let  h(n^,n)  be 
a  basic  generating  function  (i.e.,  any  one  of  the  functions  shown  in  Figure  3).  We  then  generate  a 

■  e 

frame  (Daubechies,  1990;  1992)  by  the  following  procedure; 

1.  Let  the  image  size  be  NxN  pixels.  Assume  that  iV  is  a  con:Q)osite  number  (for  aU 
practical  apphcations  iV  is  a  power  of  two)  such  that  N=MxL  .  This  is  equivalent  to  saying  that 
the  image  is  represented  hyLxL  blocks  each  of  size  Mx  M . 

2.  Generate  a  finite  set  of  functions,  h^/n^n^,  by  first  centering  h(n^n^  at  the  image 
points  {n-kM,  n-lM),  where  k,l  =  0,1,...,L-L  These  are  the  positions  where  the  expansion 
coefficients  will  be  computed. 
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Figure  3 

Hermite  Functions  of  Order  Zero  Through  Five 
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3.  At  each  position,  generate  a  set  of  fimctions,  at  various  2-D  q)atial 

frequencies,  as  follows: 


=  hkiiux,  riy)  exp 

where  m,n=0, This  corresponds  to  M,  2-D  spatial  frequencies  over  the  range  0  to  27i^ . 
The  set  of  functions  given  in  Equation  1  constitutes  a  frame  according  to  Theorem  1  from 
Janssen  (in  press),  and  thus  the  following  decoDOposhion  is  legitimate. 

4.  Compute  the  coefficients  by  decomposing  the  image  relative  to  the  set  of 
functions  as  an  inner  product  as  follows: 


^^.(nxm  nyn\ 


(1) 


such  that 


LM 

kj/n^ 


where 


LM 


k^,m,n 


L-1  L-1  M-\  M-\ 


EZ  E  Z 


A=0  1=0  m=0  n=0 


As  a  result  of  this  decomposition,  we  now  have  a  finite,  discrete,  2-D  spectrum  at  each 
location  {k,!).  The  discrete  2-D  spectrum  can  be  converted  to  frie  more  conventional 
representation  in  spatial  frequency  and  orientation  through  a  cartesian-to-polar  coordinate 
transformation. 

The  direct  conputational  procedure  to  synthesize  the  image  from  the  expansion 
coefficients,  requires  that  a  4-D  summation  be  performed.  This  is  a  con:5)utationa]ly 

intensive  procedme.  In  the  next  section  we  will  describe  our  implementation  of  a  fast  algorithm 
that  has  been  proposed  (cf ,  Zeevi  &  Gertner,  1992)  for  computing  expansion  coefficients  and  for 
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reconstructing  an  image  from  those  coefficients.  The  major  conq)utational  tool  is  the  fioite  Zak 
transform. 


The  Finite  Zak  Transform  of  Images 

The  4-D  summation  in  Equation  3  is  conq)utationally  intensive  because  it  cannot  in 
general  be  facihtated  by  an  FFT-like  algorithm.  It  has  been  shown  that  a  fast  algorithm  can  be 
developed  using  the  Zak  transform  (Zeevi  &  Gertner,  1992).  The  Zak  transform  (ZT)  of  an 
image  represents  an  intermediate  result  of  the  2-D  fast  Fourier  transform,  and  thus  provides  a 
combined  position/spatial-frequency  representation. 

For  an  image  of  size  NxN,  and  N-L-M,  the  finite  Zak  transform  is  computed  as 
follows: 


L-\  L-l 

(ZlI)  (ij ;  p,  a)  =  S  Z  IQ  +MrJ  +Mp)  exp 

r=0  p=0 


Ini 


V  L  L. 


(4) 


where  /  and  j  are  position  variables  and  p  and  a  are  frequency  variables. 

There  are  four  properties  of  the  ZT  vdiich  are  relevant  to  the  development  of  fast 
algorithms  for  image  representation.  First,  it  follows  directly  from  Equation  4  that  the  ZT  is 
periodic  in  each  pair  of  variables,  (i,j)  and  (p,<j).  Moreover,  the  ZT  satisfies  the  additional 
periodicity  relations: 


(Zl)  (i  +  MJ  +M;  p,  a)  =  exp 


X  (Zi/)(/  j;p,cf), 


(5) 


and 


(ZlI)  (iJ  ;p+L,g+L)  =  (ZlI)  (iJ ;  p,  a) .  (6) 

The  significance  of  this  property  is  that  any  computation  that  involves  the  ZT  need  be  performed 
over  the  fundamental  period  only,  since  values  outside  that  period  can  be  determined  by  the 

periodicity  relations.  For  example,  the  original  image  I(n^,  can  be  recovered  on  the  square  of 
size MxM  from  its  ZT  by: 
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(7) 


\ 


I(iJ)=L-^t  E  (Zi7)(/j;p,a). 

p=0  <1=0 


The  remainder  of  the  image  can  then  be  recovered  by  using  the  periodicity  relations  given  in 
Equations  5  and  6. 

A  second  important  property  of  the  ZT  is  that  it  is  energy  preserving.  That  is,  the  energy 
of  the  ZT  of  an  image  is  equal  to  the  energy  of  the  image  itself  (Zak,  1967;  Zeevi  &  Gertner 
1992).  More  generally,  the  inner  product  of  the  ZTs  of  two  images  is  equal  to  the  inner  product 
of  the  images  themselves,  a  property  which  often  results  in  a  conq)utational  saving  when  the  inner 
product  is  used  to  conq)ute  expansion  coeflScients. 

The  third  property  is  that  the  ZT  of  an  image  is  equal  to  the  ZT  of  the  FT  of  the  image 
times  a  phase  factor. 


e?q) 


^  \n  n) 


(ZlI)  (ij ;  -p,  -cr)  =  (ZmI)  (p,  a ;  ij) , 


(8) 


^  \n  n) 


is  the  phase  factor. 


where  /  is  the  x  iV-point  Fourier  transform  of  I,  and  exp 
This  property  is  a  consequence  of  the  fact  that  multiplying  a  corrqilex  function  (such  as  the  ZT)  by 
a  complex  exponential  (the  phase  factor)  is  equivalent  to  a  rotation  in  the  conq)lex  plane.  From 
Equation  8  it  follows  that  after  rotation  we  get  the  ZT  of  the  FT  of  the  image.  This  property 
confirms  that  the  ZT  contains  position  and  spatial  frequency  information  about  the  image. 


The  fourth  property  of  the  ZT  that  is  useful  in  image  representation  is  that  the  ZT  of  a 
function  [such  as  the  basis  function  Mn^nfl  located  at  a  particxilar  point  in  the  position  and 
spatial  frequency  domains,  is  equal  to  the  ZT  of  the  same  function  located  at  the  origin  in  both 
domains  times  an  appropriate  phase  factor.  Formally  this  can  be  written  as: 


Zfhkji^m.n  —  Z£,/zo,o,o,o  exp 


(mo+no  ki  +  li\ 
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This  property  will  be  used  below  to  develop  a  fest  algorithm  for  decon^osing  images  using 
Hermite  functions.  We  will  first,  however,  describe  the  specific  Hermite  functions  to  be  used  for 
this  propose. 


Hermite  Functions 

We  propose  to  represent  images  usmg  a  basis  derived  jfrom  the  Hermite  polynomials, 
HJx),  which  can  be  defined  as: 

H„{x)  =  (-1)”  exp(x2)  ^[exp(-x2)]  ^  (10) 

where  n  =  0,1,2,....  An  orthonormal  basis,  which  we  will  refer  to  as  Hermite  fimctions,  HFJx) ,  is 
related  to  H/x)  as  follows  (Lebedev,  1972): 


«  =  0,1,2, 


(11) 


Gabor  (1946)-has  shown  that  HF/x)  provides  a  local  minimum  of  the  joint  rmcertainty  product. 
Ax  Acd,  where  x  and  co  are  position  and  spatial  frequency  variables,  respectively.  For  other 
orders,  HFJx)  can  provide  either  maxima  or  minima  (c.f,  Klein  &  Beutter,  1992).  We  will  use 

HFJx,y)  =  HFJx)  HFJy)  as  a  basis  for  image  deconposition  in  the  position/spatial-fi:equency 
domain.  Shown  in  Figme  3  are  examples  of  the  two-dimensional  window  function,  HFJx,y), 
corresponding  to  n  =  0,1,2, 3,4, 5. 

Fast  Algorithms  for  Image  Expansion 

One  advantage  of  the  Hermite  function  approach  is  that  it  allows  the  fast  con^utational 
procedure  developed  by  Zeevi  and  Gertner  (1992)  to  be  used  for  image  synthesis  and  analysis. 
We  will  now  summarize  the  rationale  of  that  procedure. 

As  a  first  step  toward  developing  a  fast  conputational  procedure  for  calculating  expansion 
coefficients  and  synthesizing  an  image  from  them,  we  will  corcpute  the  ZT  by  representing  both 
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the  image,  /,  and  the  basis  functions,  h,  as  four  dimensional  arrays  whose  indices  extend  over 
dimensions  smaller  than  that  of  the  original  image.  Thus,  for  image  size  N  (where  N=MxL), 
and  r,s  =  and  ij  =  0,1,... M-h  denote =/ (i+rM,j+sM),  and  h^i^^  Ji,r;j,s)  = 

y+i'X/  We  then  represent  the  image  as 


LM 

I(i,fy,s')=  ^  Ckj^rn,n 
kj,ni,n 

The  coefficients,  c,  are  now  the  expansion  coefficients  in  the  position/spatial-frequency  domain 
relative  to  the  base  function  h.  The  second  step  in  developing  the  fast  conq)utational  procedure  is 
based  on  &st  taking  the  ZT  of  both  sides  of  Equation  12,  resulting  m; 


ZLliiJ;p,o) 


LM 

kj/n/i 


(13) 


In  this  form,  computation  of  the  coefficients  would  require  a  4-D  summation.  However,  property 
four  of  the  ZT,  described  above,  allows  us  to  sinqrlify  the  conqrutation  by  substituting  Equation  9 
into  Equation  13,  and  removing  the  ZT  of  h  from  the  summation  since  it  is  no  longer  dependent 
on  the  indices  of  summation.  The  result 


LM 

ZlIQ,]  j  P,  Cr)  =  Zi,ho,0,0,0  ^ 

k^,m,n 


ejqp 


mp+no  ki  +  lj 
-2711-^. - +  -77^ 

L  M 


(14) 


is  the  one  sought  in  that  it  is  in  a  form  to  which  EFT  algorithms  can  be  applied. 

The  interpretation  of  Equation  14  is  that  the  ZT  of  an  image  is  equal  to  the  ZT  of  the  base 
function,  located  at  the  origin,  times  the  Fourier  transform  of  the  expansion  coefficients.  Once  the 
ZT  of  the  base  function,  h,  is  coirputed,  it  can  be  stored  and  used  on  any  image.  It  should  be 
noted  that,  since  the  ratio  of  the  ZT  of  the  image  and  the  ZT  of  the  base  fimction  is  required,  care 
must  be  taken  if  the  ZT  of  h  has  a  zero.  Techniques  for  dealing  with  this  situation  have  been 
developed  (Assaleh,  Zeevi,  &  Gertner,  1991).  In  the  context  of  the  present  technique,  image 
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synthesis  is  performed  by  taking  the  Fourier  transform  of  the  coefldcients,  multiplying  by  the  ZT 
of  the  base  function,  and  then  taking  the  inverse  ZT  of  the  result. 


Image  Expansion  Using  Hermite  Functions 

An  exanq)le  of  a  siirqple  texture  image  deconq)osed  using  Hermite  functions  (HFs),  and  the 
Zak  transform  technique  just  described,  is  shown  in  Figure  4.  The  original  256  x  256  image  is 

shown  on  the  right,  and  the  HFs  used  for  the  decomposition  were  of  order  2,  3,  and  4.  A 
program  for  performing  this  Zak-Hermite  decomposition  is  presented  in  Appendix  1.  The 
reconstructions  shown  in  Figure  4  were  obtained  from  only  3,000  coefficients  selected  from  the 
total  of  65,536.  As  can  be  seen  from  the  figure,  the  even-order  HFs  produced  good 
reconstructions  whereas  the  odd  order  HFs  did  not  adequately  recover  the  original  texture.  This 
result  might  be  related  to  the  observation  of  Klein  and  Beutter  (1992)  that  certain  HFs  minimize 
the  joint  position/spatial-frequency  uncertainty  whereas  others  maximize  it.  It  is  well  known  that 
HFs  are  eigenfunctions  of  the  Fourier  transform,  which  means  that  the  Fourier  transform  of  a  HF 
has  the  same  form  as  the  HF  itself  The  difference  in  the  efficacy  of  the  odd  and  even  HFs  in 
representing  the  image  of  Figure  4  may  also  be  related  to  the  fact  that  the  eigenvalues 
corresponding  to  even-order  HFs  are  real  (either  -1  or  +1),  while  those  corresponding  to 
odd-order  HFs  are  pure  imaginary  (either  -i  or  +i).  HFs  with  imaginary  eigenvalues  may  not  be 
good  basis  functions  for  image  analysis  and  synthesis  since  their  Fourier  transform  is  rotated  in  the 
position  domain  by  90  degrees  (this  rotation  is  equivalent  to  multipKcation  by  +i  or  -i)  relative  to 
the  HF.  As  a  result,  the  expansion  coefficients  are  computed  with  respect  to  functions  that  are 
mismatched  in  the  position  and  spatial  frequency  domains  and  which  therefore  may  introduce 
noise  hi  the  calculation  of  the  coefficients.  - 
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IV.  IMAGE  REPRESENTATION  USING 
A  WEIGHTED  ZAK  TRANSFORM 


Introduction 

Gabor  elementary  functions  (GEFs)  have  been  used  increasingly  in  image  representation 
and  synthesis  (Daugman,  1988;  Ebrahimi  &  Kunt  1991;  Porat  &  Zeevi,  1988).  Since  GEFs  do 
not  form  an  orthogonal  basis,  a  biorthogonal  auxiliary  function  is  required  to  determine  the 
expansion  coefiScients  (Bastiaans,  1981;  Porat  &  Zeevi,  1988).  Several  approaches  have  been 
taken  to  solving  the  problem  of  calculating  the  required  coefficients.  For  instance,  Daugman 
(1988)  and  Ebrahimi  and  Kunt  (1991)  have  suggested  iterative  techniques  for  accon:q)hshing  this. 

Zak  (1967)  showed  that  a  signal  could  be  transformed  such  that  when  the  result  is  rotated 
by  90  degrees,  the  inverse  transformation  produces  the  Fourier  transform  (FT)  of  the  original 
signal  (see  Figure  5).  The  Zak  transform  (ZT)  has  since  been  used  to  efficiently  compute  Gabor 
coefficients  of  images  and  to  reconstmct  images  back  from  those  coefficients  (Zeevi  &  Gertner, 
1992).  Zeevi  and  Gertner  (1992)  took  a  more  direct  approach  to  the  calculation  of  Gabor 
coefficients  by  developing  a  conq)utationally  efficient  (4-D  FFT)  algorithm  based  on  the  Zak 
transform.  However,  it  is  well-known  (Janssen,  1982)  that  if  the  Zak  transform  of  a  finite-energy 
function  is  continuous  then  it  has  a  zero.  Although  Zeevi  and  Gertner's  approach  effectively 
avoided  the  "zero"  of  the  Zak  transfonn,  it  did  not  ehminate  it,  and  so  the  method  remained 
sensitive  to  noise.  The  problem  can  be  illustrated  as  follows. 

The  Gabor  representation  of  a  one-dimensional  signal,  fi^x)  can  be  written  as: 

y(x)  =  E  E  •  gmnix)  ,  (15) 

m  ft 


where  g^(x)  is  a  set  of  Gabor  functions,  and  are  the  associated  Gabor  coefficients.  By  taking 
the  ZT  of  both  sides  of  this  equation  and  using  the  property  (Zak,  1967)  that  the  ZT  of  a  shifted 
(in  both  position  and  spatial  frequency)  gaussian  is  equal  to  the  ZT  of  an  unshifled  gaussian 
multiphed  by  a  phase  factor,  we  get: 
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A)  The  Original  Waveform. 

Magnitude 


B)  The  Fourier  Transform. 
Magnitude 
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l! 


Frequency 
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C)  The  Zak  Transform  of  the  Original  Waveform. 


Frequency 


108 


Time 


D)  The  Zak  Transform  of  the  Fourier  Transform  of 
the  Original  Waveform 


Figure  5 

A  Signal  and  Both  Its  Zak  and  Fourier  Transforms 
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(16) 


iZf)(r,s)  =  (Zgm)(r,s)  ES  Omn  •  phase  factor, 

m  n 


where  ggfx)  is  a  gaussian  function  (i.e.,  a  Gabor  fiinction  of  spatial  frequency  zero).  This 
equation  indicates  that  the  Gabor  coefficients  can  now  be  computed  by  taking  the  inverse  FT  of 
the  ratio  Zf(x)/Zggj(x).  However,  since  g^Jx)  has  a  gaussian  window,  its  Zak  transform  (see 
Figure  6a)  has  a  zero  which  renders  the  computational  ratio  undefined  and  the  computation  of  the 
coefficients  unstable  (i.e.  noisy).  The  instability  can  result  in  an  unacceptable  time-frequency 
representation  of  the  signal. 

Previous  attempts  to  address  the  zero  problem  have  resulted  in  only  approximate 
solutions.  For  instance,  given  a  grid  with  a  point  at  zero,  another  grid  is  chosen  which  is  shifted 
by  1/2  in  order  to  avoid  sampling  at  zero.  The  problem  with  this  approach  is  that  while  taking  a 
large  number  of  sampling  points  is  desirable  (in  order  to  get  as  many  coefficients  as  possible),  it 
also  results  in  sampling  points  close  to  zero,  thus  causing  instability  in  the  form  of  approximation 
errors  near  the  (unsampled)  zero  point. 

We  have  applied  the  mathematical  concepts  described  by  Daubechies  (1990)  and  by 
Auslander,  Gertner,  and  Tolimieri  (1991)  to  develop  a  stable  technique  for  calculating  Gabor-like 
expansion  coefficients.  The  technique  allows  a  signal  to  be  reconstructed  from  two  sets  of 
coefficients  representing  the  even-  and  odd-indices,  respectively,  along  one  coordinate,  and  it 
avoids  the  zero  problem  discussed  above.  Further,  the  window  function  forming  the  basis  used  in 
the  present  decomposition  procedure  resembles  a  visual  receptive  field.  Therefore,  the  expansion 
coefficients  obtained  with  this  technique  may  be  relatable  to  cortical  processes  and,  given  their 
computational  advantages,  may  represent  an  improvement  over  the  weighting  function  outputs 
previously  used  in  visual  system  models  (Daugman,  1980,  1985;  Fogel  &  Sagi,  1989;  Malik  & 
Perona,  1990;  Stork  &  Wilson,  1990;  Turner,  1986;  Watson,  1983). 
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Approach 

We  first  describe  a  formal  procedure  (Zeevi  &  Gertner,  1992)  for  computing  stable 
expansion  coefficients  using  the  Zak  transform.  Denote  a  one-dimensional  gaussian  window 
function  by; 


g(x)  =  exp(-7cc^)  .  (17) 

The  Zak  transform  (ZT)  of  any  function,  /,  is  a  doubly-periodic  function  in  two  variables  and  is 
defined  as: 

00 

(Zf)(r,s)  =  r^  2^  Qxp{2Tiirl)-fiT{s  +  t))  ,  (18) 

l=-<a 


where  r  and  s  are  spatial  fi-equency  and  position  variables,  respectively  (Zak,  1967).  Since  the  ZT 
is  doubly-periodic  with  a  fundamental  period  corresponding  to  the  unit  square,  r  and  ^  each  take 
on  values  in  the  interval  0  to  1. 

Define  a  set  of  Gabor  functions,  g^(x),  which  correspond  to  shifts  in  position  (nqg)  and 
spatial  frequency  (mpg): 

gmn(x)  =eKp(impox)  •  g(x-nqo)  ,  (19) 

where  qg=T/2,  Pg=2/T,  and  g(x-nq^  are  window  functions  shifted  in  position.  Next,  compute  the 
ZT  of  the  set  of  (shifted)  Gabor  functions  as; 

00 

(Z^m„)(r,j)  =  H  2)  Qxp{2nirl)  - gmn(T(s+t))  .  (20) 

l=-oo 


By  Equation  19: 
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A)  Zak  transform  of  Gaussian. 


B)  Zak  transform  of  shifted  Gaussian. 


C)  The  weighting  function. 


Figure  6 

The  Zak  Transform  of  Two  Gaussians  and  the  Resultant  Weighting  Function 
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gmn(Tis  +  /))  =  exp(2raw(5  +  0)  ■  gijis + / - 1))  , 


(21) 


and  substituting  Equation  21  into  Equation  20  gives; 

GO 

(Zgm„)  (r,  s)  =  T2^  exp(27C7r/)  •  exp(27i/w(5’ + /))  •  g(T(s  +  /  - 1))  .  (22) 

/=— 00 

As  a  first  step  toward  obtaining  an  improved  set  of  basis  fiinctions  that  avoids  the 
zero-problem  described  earlier,  we  take  the  ZT  of  the  set  of  shifted  Gabor  basis  functions  for  two 
image  subgrids.  One  subgrid  corresponds  to  the  even  points: 

(2^/«>,)  0',s)  =  exp(27cwir)  •  exp(27t/m)  •  (Zg)  (r,s)  ,  (23) 

where  (Zg)(r,s)  is  the  ZT  of  the  unshifted,  zero-fi-equency  Gabor  function,  and  the  other  subgrid 
corresponds  to  the  odd  points: 

(^gin>,+i)('',^)  =  exp(27cmir)  exp(27t7m)  (Zg)(r,5-j)  ,  (24) 

where  (Zg)  (r,  -s  -  j)  is  the  ZT  of  the  shifted,  zero-fi-equency  Gabor  function  whose  zero  is  now  on 
the  boundary  of  the  unit  square  (see  Figure  6b). 

It  can  be  shown  (Daubechies,  1990)  that  the  ZT  of  an  image,/,  can  be  written  as: 

(ZJ)  (r,s)  =  (Zg)  (r,5)2  Z  ■  exp(2TOWi/')  •  exp(27t/OT:y) 

m  m 

+  (Zg)(r,s-\)  •  exp(27i/wir)  •  exp(2ra>w5)  ,  (25) 

m  ni 

where  the  odd  and  even  coefficients  can  be  obtained  by  computing  the  inner  product  of  the  image 
with  the  function  gmn(r,s)  whose  ZT  is  defined  as: 
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Zg(r,s) 


(26) 


zg(r,s)  = 


\Zg(r,s)\^  +  \Zg{r,s-\)f 


where  |g|  denotes  the  magnitude  of  the  complex-valued  fimction  g.  Using  the  unitarity  (energy 
preserving)  property  of  the  ZT  (Auslander  etal.^  1991): 


Clmn  —  (J ,  gmn  )  —  {Zf ,  Zgmn  ) 


(27) 


Thus,  the  even  coeflBcients  for  Equation  25  can  be  computed  as; 


am.2rn  =  \  dr  ds  exp(-27C/>iir)  •  exp(-27r//w5)  •  Zfir, s)  ■  Zg(r, s)  ,  (28) 

J  0  J  0 

where  Zg(r,s)  denotes  the  complex  conjugate  of  the  fimction  Zg(r,s).  It  can  be  seen  fi-om 
Equation  28  that  the  even  coefficients  are  the  FT  of  Zf  •  Zg.  The  odd  coefficients  can  be 

computed  similarly  as  the  following  FT; 


am,2ni+\=\^\^drds  exp(-27a«  i  r)  •  exp(-27t//w5)  •  Zf{r,  s)  •  Zg{r,  5  -  j)  .  (29) 


Once  the  even  and  odd  coefficients  are  computed  using  Equations  28  and  29,  the  original  signal 
can  be  reconstructed  using  Equation  25. 


Discussion 

Gabor  functions  are  used  in  image  representation  (Ebrahimi  &  Kunt,  1991;  Manjunath  & 
Chellappa,  1993;  Porat  &  Zeevi,  1988)  and  visual  system  modeling  (Daugman,  1980,  1985;  Fogel 
&  Sagi,  1989;  Malik  &  Perona,  1990;  Stork  &  Wilson,  1990;  Turner,  1986;  Watson,  1983) 
because  they  are  well-localized  in  both  position  and  spatial  frequency,  and  they  resemble  certain 
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visual  receptive  fields.  The  major  disadvantage  of  using  Gabor  functions  is  that  they  are  not 
orthogonal  and  so  do  not  provide  the  most  efficient  representation.  Techniques  have  been 
developed  (Bastiaans,  1981;  Porat  &  Zeevi,  1988)  for  obtaining  Gabor  coefficients  using 
biorthogonal  functions,  but  these  techniques  are  computationally  intensive.  A  fest  algorithm  for 
computing  Gabor  coefficients  has  recently  been  developed  (Gertner  &  Zeevi,  1990)  based  on  the 
ZT.  However,  one  property  (Janssen,  1982)  of  the  ZT  is  that  when  it  is  applied  to  a  finite-energy, 
continuous  function,  the  result  has  a  zero  in  the  unit  square.  This  zero  leads  to  instabilities  in  the 
calculation  of  expansion  coefficients  (see  Introduction). 

Shown  in  Figure  7a  is  a  window  function,  g(r),  obtained  by  taking  the  inverse  ZT  of  the 
function  Zg(r^s).  The  magnitude  of  the  complex-valued  fiinction  Zg(r,s)  is  shown  in  Figure  7c, 

The  function,  Zg(r,s),  does  not  have  a  zero  as  is  evident  from  the  right  side  of  Equation  26~the 
functions  forming  the  denominator  each  have  only  one  zero  and  each  is  in  a  different  place.  A 

graphical  representation  of  the  denominator  of  Equation  26  is  shown  in  Figure  6c.  The  function, 
Zg(r,s),  is  obtained  when  the  denominator  of  Equation  26  is  multiplied  by  Zg(r,s).  The 

denominator  in  Equation  26  is,  in  effect,  a  weighting  function  to  Zg(r,s),  which  removes  the  zero 

from  the  latter  and  which  produces  a  symmetrical  biphasic  wiudow  function.  The  expansion 
coefficients  of  Equations  28  and  29  can,  therefore,  be  computed  and  the  confutation  is 
numerically  stable.  Further,  in  order  to  confute  the  expansion  coefficients  of  a  two-dimensional 
signal  it  is  necessary  to  perform  a  four-dimensional  FT.  Equations  28  and  29  are  of  a  form  to 
which  an  FFT  can  be  apphed,  thus  significantly  reducing  the  computational  effort  required  to 
obtain  the  expansion  coefficients. 

An  ideal  filter  is  one  which  uniformly  passes  some  frequencies  and  conf  letely  rejects  all 
others  (Oppenheim  &  Willsky,  1983).  The  time-domain  representation  of  an  ideal  filter  is  a 

sinc-function  that  typically  displays  significant  side-lobes  or  ringing.  As  shown  in  Figure  7b,  the 
FT  of  the  window  function,  g(r),  approximates  an  ideal  filter  in  that  it  has  a  relatively  well-defined 

pass-band.  The  price  paid  for  this  property,  however,  is  some  oscillatory  behavior  in  the  time 
domain  (see  Figure  7a).  Exanf  les  of  a  deconf  osition,  with  respect  to  Zg(r,s),  of  both  a  spatially 

narrow  and  a  spatially  wide  one-dimensional  (time)  signal,  are  shown  in  Figures  8  and  9, 
respectively.  As  is  evident  from  these  figures,  the  new  window  function  can  be  used  to  obtain  an 
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A)  The  new  waveform. 

Magnitude 


B)  The  Fourier  transform  of  new  waveform. 

Magnitude 


Figure  7 

The  New  Waveform  and  Both  Its  Zak  and  Fourier  Transforms 
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A)  Input  signal. 

Magnitude 


1 

Time 

- — 

512 

B)  Fourier  transform  of  input  signal. 
Magnitude 


100 


- Frequency 

412 


C)  Expansion  coefficients  for  new  waveform. 


Figure  8 

Decomposition  of  a  Narrow  Signal  Using  the  New  Waveform 
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A)  Input  Signal. 

Magnitude 


B)  Fourier  transform  of  input  signal. 


Magnitude 


C)  Expansion  coefficients  for  new  waveform. 


Figure  9 

Decomposition  of  a  Wide  Signal  Using  the  New  Waveform 


appropriate  joint  time/frequency  representation  that  is  consistent  with  the  scaling  property  of  the 
FT. 

It  should  be  noted  that  the  basis  functions  generated  using  the  new  waveform,  g(r,  ^),  as  a 
window  function  are  all  of  the  same  size.  Although  it  may  be  more  biologically  relevant  to  use  a 

pyramidally-scaled  Gaborian  basis  (Daugman,  1988;  Porat  &  Zeevi,  1988),  computationally  fast 
and  stable  techniques  for  performing  such  a  decomposition  have  not  to  our  knowledge  been 
developed.  While  it  is  generally  accepted  that  bases  used  in  visual  research  should  be  spatially 
localized  (Koenderink  &  van  Doom,  1990;  Stork  &  Wilson,  1990),  there  are  few  theoretical 
criteria  for  choosing  among  the  alternatives.  One  advantage,  in  this  regard,  of  the  window 
function  derived  here,  is  that  it  resembles  a  visual  receptive  field.  Further,  this  window  function 
can  be  easily  scaled  to  give  a  basis  suitable  for  multiresolution  analysis,  and  it  provides  a 
mathematically  rigorous,  stable  algorithm  for  computing  expansion  coefiBcients. 

V.  IMAGE  REPRESENTATION  USING  A 
LOCALIZED  COSINE  TRANSFORM 


Introduction 

Like  the  Discrete  Fourier  Transform  (DFT),  the  Discrete  Cosine  Transform  (DCT)  gives  a 
spectral  representation  of  an  image.  The  global  (spatial-fi-equency  only)  character  of  both  the 
DFT  and  the  DCT  is  not  suitable  for  representing  images  whose  spatial  detail  is  not  distributed 
homogeneously.  This  is  the  case  for  most  natural  images  and  thus  there  has  recently  been  great 
interest  in  techniques  that  provide  a  combined  representation  in  both  position  and  spatial 
frequency  (Jacobson  &  Wechsler,  1988;  Zeevi  &  Gertner,  1992).  One  such  combined  technique 
is  based  on  Gabor  functions  and  represents  images  using  complex-valued,  Fourier-based 
transforms. 

The  purpose  of  the  present  study  is  to  investigate  the  applicability  of  the  DCT  to  the 
position/spatial  frequency  representation  of  images.  For  this  purpose,  a  real.  Discrete  Zak-Cosine 
Transform  (DZCT)  has  been  developed  and  certain  of  its  properties  have  been  investigated.  The 
DZCT  provides  a  combined  position/spatial  fi-equency  representation  using  only  real  variables 
(i.e.,  the  local  spectral  characteristics  of  an  image  can  be  represented  as  an  array  of  real  numbers) 
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and  hence  is  more  eflBcient  than  Fourier-based  approaches.  The  DZCT  also  provides  a  new 
perspective  for  localized  image  compression,  in  that  it  provides  a  local  DCT  whose  spectral 
properties  may  be  varied  at  each  location  in  the  image  in  order  to  match  image  characteristics.  In 
addition,  the  DZCT  has  an  inverse  transform  that  reproduces  the  original  signal,  and  thus  can  be 
used  for  image  filtering  and  compression  in  coefiScient  space.  Finally,  since  the  DZCT  is  position 
sensitive,  we  can  design  filters  that  match  the  local  properties  of  the  image.  The  major  benefit  is 
that  aU  computa^tions  are  real,  thus  saving  both  memory  and  computational  time. 


The  Discrete  Zak-Cosine  Transform 

One  form  (Rao  &  Yip,  1990)  of  the  DCT  is: 


with  inverse 


T-  + 1) 

Fk=Zjfj-^os  ^ 

7=0 


(30) 


N-1 


COS 


fc=l 


%k(2J+l) 

IN 


(31) 


This  form  is  obtained  by  extending  the  given  data  fi-om  j  =  0,  ....  N-I,  to  j  =  N,  ....  2N-1 
symmetrically  around  the  point  N-1/2  (Press,  Teukolsky,  Vetterling,  &  Flannery,  1993).  This 
extension  produces  a  sequence  that  is  even  about)  =  -1/2. 

It  has  been  shown  (Zeevi  &  Gertner,  1992)  that  the  Zak  transform  (ZT)  can  be  used  to 
adequately  represent  images  in  the  combined  position/spatial-frequency  domain.  Analogous  to 
the  manner  in  which  Zeevi  and  Gertner  computed  the  ZT  using  the  DFT,  we  will  define  the  ZT 
using  the  DCT  as  follows: 


/=0 


IN 


(32) 
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where  r  and  s  are  spatial  frequency  and  position  variables,  respectively.  Thus,  Equation  32  is  the 
definition  of  the  one-dimensional  DZCT  of  the  signal/ 


Properties  Of  The  Discrete  Zak-Cosine  Transform 

The  DZCT  is  a  real  transform  that  has  many  useful  properties.  For  instance,  while  the 
DZCT  is  defined  on  the  unit  square  only,  it  can  be  extended  beyond  the  unit  square  using  the 
following,  easily  verified,  periodicity  properties: 

Z^r+l,s)  =  ZJ{r,s)  ,  (33) 

and 

ZSr,!:+\)^cos^^Q^-ZJ{r,s)  .  (34) 

Equations  33  and  34  indicate  that  the  DZCT  is  periodic  with  period  one  m  the  r-variable,  and  is 
quasi-periodic  with  period  one,  in  the  5-variable. 

In  order  to  develop  a  combined  position/spatial-frequency  representation  using  a  cosine 
basis,  we  will  follow  the  development  described  in  Zeevi  and  Gertner  (1992).  We  will  show 
below  that  the  DCT  can  be  used  in  place  of  the  DFT  in  order  to  obtain  the  combined 
representation.  To  this  end,  we  begin  by  defining  a  basis  as: 

gmnis)  =  cos  . g(s  -  nqo)  ,  (35) 


where  is  a  window  function,  nq^  is  a  positional  shift  of  the  window,  and  mpg  is  the  spatial 
frequency.  We  can  then  con:q)ute  the  DZCT  oig^(s)  as: 


iZcg  mn 


JV-l 

I 

/=0 


cos 


%l(2r  + 1) 
IN 


■gmr,{s+N-l) 


(36) 
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Performing  some  relevant  algebraic  manipulations  (Zeevi  &  Grertner,  1992)  on  Equation  36  gives: 


{Zcgm,i){r,s)  =  {Zcg)(r,s)  ■  cos 


Tzmpo(^r+V)  %nqo(^  + 1) 


IN 


cos- 


2N 


(37) 


Thus,  we  can  expand  (Gertner  &  Geri,  in  press;  Zeevi  &  Gertner,  1992)  the  signal  /with  respect 
to  the  basis  shown  as  Equation  35  as: 

fis)  =  S  •  gm„{s)  .  (38) 

mn 


Then,  by  taking  the  DZCT  of  both  sides  and  using  the  property  shown  as  Equation  37,  we  get: 

(Z/)(r,5) 

—  ^  ■  (Zcgmn}(f' 5 -S') 

mn 


N/  aV  nmpo(2r+l)  Tcnqo(2s+l) 

=  (Zcg)(r,  s)2j  amn  •  cos  — ^  •  cos  -  ^  ^ 


2N 


2N 


(39) 


The  coefficients  ,  which  are  analogous  to  Gabor  coefficients  (Gabor,  1946),  are  corqputed  by 
taking  the  inverse  2-D  DCT  of  the  ratio  Zf/Z^. 


Discussion 

The  DCT  has  been  used  in  a  wide  range  of  digital  signal  processing  applications  including 
filtering,  coding,  conqjression,  and  classification  (Rao  &  Yip,  1990).  For  instance,  it  has  become 
a  standard  tool  in  the  compression  of  both  still  and  moving  images  (see,  Pennebaker  &  Mitchell, 
1993,  and  Appendix  2),  and  special  hardware  has  been  developed  to  inqjlement  it  (Rao  &  Yip, 
1990). 

Unlike  the  DFT,  the  DCT  involves  only  real  numbers  and  thus  requires  less  memory  and 
less  computational  time  to  inq)lement.  Other  techniques  that  have  been  suggested  for 
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reconstructing  images  from  real  coefficients  only  (Oppenheim  &  Lun,  1981;  Behar,  Porat  & 
Zeevi,  1992),  do  so  using  partial  information.  By  contrast,  the  technique  described  in  this  section 
allows  error-free  image  reconstruction. 
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Vn.  APPENDICES 


Appendix  1; 

Source  code  for  the  Zak-Hermite  decomposition  program,  which  consists  of  the  following 
modules: 

1)  image.c  —  the  main  module  which  calls  other  modules,  defines  menus  and  associated 
display  windows,  and  defines  all  constants  for  diplay  colors,  the  keyboard,  window 
memory  management,  and  the  cursor, 

2)  herm.c  -  module  that  performs  the  Zak  transform  using  a  Hermite-fimction  window. 
The  routine ybumc  (Press  et  ai,  1993)  is  used  to  perform  the  EFT. 

3)  dispjmax  -  Module  that  diplays  a  bitmapped,  binary  image  in  low  resolution  (320  x 
200  X  8  bits).  Image  may  be  chpped  depending  on  its  size. 

4)  cmpjima.c  —  Module  that  produces  graphics  diplay  using  Trident  graphics  diplay 
card  (or  any  conpatible).  Four  graphics  windows  are  placed  on  the  screen.  Total 
diplay  resolution  is  640  x  480  x  8  bits,  so  four  256  x  256  images  can  be  diplayed. 

5)  5x8. inc  —  Module  that  defines  a  bitmap  for  each  a^hanumeric  character  needed  to 
produce  SuperVGA-resolution  graphics  using  the  Trident  card.  This  module  is 
required  because  the  NDP  conpiler  does  not  support  the  Trident  card  in  SuperVGA 
resolution. 

6)  prntjma.c  —  Module  used  to  ouput  a  256  x  256  image  to  a  HP  Laseijet  (or 
compatible)  printer. 

7)  utile  —  Module  library  for  defining  operations  with  complex  numbers,  and  memory 
allocation  for  image  and  coefiheient  arrays. 
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276:  char  naine1[13]; 
277:  char  naii)e2M3]; 
278:  € 
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53:  v=(double  *)n)alloc((unsi9ned)  (nh-nUD^sizeof (double)) 

54:  if  (Iv)  nrerrorC'al location  failure  in  dvectorO”); 

55:  return  v-nl; 
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Appendix  2. 

Description  of  the  Discrete  Cosine  Transform  (DCT) 
as  Implemented  by  the  Joint  Photographic  Ejq)erts  Group  (JPEG) 
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The  Discrete  Cosine  Transform  (DCT):  Definitions  and  Rationale 

Con^ression  usmg  the  DCT  is  carried  out  in  three  stages:  1)  DCT  transform,  2) 
coefficient  quantization,  and  3)  lossless  coding.  The  two-dimensional  DCT  (cf.,  Pennebaker  & 
Mitchell,  1993;  Rao  &  Yip,  1990)  is  defined  as  follows: 


DCT{i,j)  = 


1 

JlN 


N-\  N-l 


c(i)  ■  c(j)  E  E  •  cos 

jp=0  y=0 


(2x  +  \)i% 

■  cos 

'(2y  +  iyn 

2N  i 

IN 

(A) 


where  C(x)  =  1/^2  when  x=0  and  C(x)  =  1,  otherwise,  for  an  image  of  rize  NxN  pixels.  The 
DCT  transform  yields  an  NxN  (i.e.,  square)  matrix  of  frequency  coefficients. 

The  inverse  DCT  is  defined  similarly  as: 


E  E  C(i)  •  C(j)  •  DCT(i,J)  •  I(x,y) .  cos 


.  1 ,1 

JlN  x=0  y=0 


(2x  +  l)m 
2N 


•cos 


IN 


■  (B) 


After  taking  the  DCT  of  an  image,  aU  the  elements  in  row  0  have  a  zero  frequency 
component  in  one  direction,  and  all  the  elements  in  column  0  have  a  frequency  conq)onent  of  zero 
in  the  other  direction.  As  the  zeros  and  columns  move  away  fi-om  the  origin,  the  coefficients  in 
the  transformed  DCT  matrix  begin  to  represent  higher  fi-equencies,  with  the  highest  fi-equencies  at 
position  N-l  of  the  matrix.  This  fact  is  significant  in  image  compression.  For  most  images,  the 
jfrequency  components  in  row  0  and  column  0  (the  DC  components)  are  relatively  large.  As  we 
move  away  from  the  DC  components  toward  the  higher  fi-equencies,  we  find  that  the  coefficients 
not  only  tend  to  have  lower  values,  but  they  become  less  inq)ortant  perceptually  for  image 
description.  So  the  DCT  can  help  to  identify  pieces  of  information  in  the  image  that  can  be 
effectively  removed  without  seriously  conq)romising  image  quality.  The  quantitative  description 
of  image  quahty  is  an  important  issue.  So  far,  there  is  no  single,  good  measure  for  specifying  the 
perceptual  quahty  of  an  image.  The  procedure  for  deciding  how  to  discard  the  relatively 
insignificant  image  information  (ie.,  coefficients)  will  be  described  later. 
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Before  taking  the  DCT,  the  image  is  broken  down  into  small  blocks  of  size  8x8  or  16x16 
pixels,  and  the  DCT  is  taken  on  these  smaller  blocks.  The  upper  left  comer  m  each  block 
represents  the  DC  term  for  that  block.  The  lower  right  comer  of  the  block  represents  the  highest 
frequency  for  the  block.  The  block  size  is  a  parameter,  however,  the  conventional  practice  is  to 
use  an  8x8  block. 


Quantization 

The  next  step  in  image  compression  is  quantization.  The  original  image  is  represented  by 
8  bits  per  pixel  resulting  m  256  different  gray  scale  levels.  After  the  DCT,  each  frequency  point 
takes  on  values  in  the  range  from  -1,024  to  +1,023,  and  thus  occupies  11  bits.  Quantization  is  a 
procedure  that  reduces  the  number  of  bits  required  to  represent  each  coefficient.  When  we 
reduce  the  number  of  bits  needed  to  store  the  coefficients,  we  also  reduce  precision.  Since  the 
coefficients  far  from  the  low  frequency  components  contribute  little  to  perceptual  image  quahty, 
the  precision  of  these  coefficients  can  be  low  (ie.,  they  can  be  represented  with  a  small  number  of 
bits).  The  farther  away  from  the  DC-pomt  of  the  image,  the  less  a  given  element  contributes  to 
image  quality,  and  the  less  we  are  concerned  with  its  precision. 


Quantization  is  inqilemented  using  a  quantization  matrix.  For  every  element  position  in 
the  DCT  matrix,  a  corresponding  value  in  the  quantization  matrix  gives  a  quantization  value 
step-size.  This  step-size  ranges  from  1  to  255.  The  most  important  elements  m  the  image  will  be 
encoded  with  a  small  step  size,  with  size  1  givmg  the  most  precision.  The  less  important  values, 
corresponding  to  high  frequencies,  will  be  encoded  with  a  large  step  size.  The  quantization 
formula  is  as  follows: 


Quantized j^alueiiJ)  =  RoundJojiearestJnteger 


DCTiiJ) 


(C) 


[  step_size(ij) . 

where,  stepjsize  (i,j)  is  obtamed  from  the  quantization  matrix.  From  the  above  formula,  it  is  clear 
that  for  large  step-size  values,  small  DCT  (ij)  values  will  be  quantized  to  zero.  This  is  the  case, 
for  instance,  for  most  high  frequency  con^ionents.  The  smaller  the  step  size,  the  higher  is  the 
precision  of  the  quantization  .  Thus,  for  example,  a  high  value  of  DCT  (iJ)  with  a  small  step-size 
will  have  a  large  quantized  value.  A  small  value  of  DCT  (i,j)  with  a  large  step-size  will  be 
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quantized  to  zero.  Thus,  only  if  high  frequency  coefficients  reach  unusually  large  values,  will  they 
be  quantized  to  non-zero  values. 

The  dequantization  formula  operates  in  reverse  order: 

DCTijJ)  =  QuantizedjyalueiiJ)  •  step_size(j,j)  .  (D) 

Obviously,  large  step-sizes  can  generate  large  errors. 


Selection  of  a  Quantization  Matrix 

The  selection  of  an  appropriate  quantization  matrix  is  the  most  m^ortant  step  in 
determining  the  quality  of  a  decompressed  image.  Although  there  is  no  rigorous  mathematical 
theory  by  which  to  choose  the  quantization  matrix,  there  are  many  experimental  approaches.  One 
of  the  approaches  is  to  measure  the  error  between  the  origmal  image  and  the  deconq)ressed  image 
for  each  quantization  matrix,  and  then  choose  the  quantization  matrix  that  gives  the  minimal  error. 
Another  approach  is  to  evaluate  the  deconqrressed  unage  using  perceptual  criteria.  Of  course,  the 
perceptual  criterion  will  not  always  correspond  exactly  to  the  minimal  error  criterion.  Because 
the  quantization  matrix  determines  the  decortqjressed  image  quality,  choosing  extraordinarily  large 
step-sizes  for  the  DCT  coefficients  results  m  excellent  conq)ression  ratios  but  poor  knage  quality. 
By  choosing  small  step-sizes,  we  get  low  conq)ression  ratio  but  very  good  deconq)ressed  image 
quahty.  This  tradeoff  allows  a  great  deal  of  flexibility  in  choosing  the  required  picture  quality 
based  on  imaging  requirements  and  available  storage. 

The  quantization  matrix  desired  is  computed  as  follows: 

Stepjize  [/]  [/]  =  1  +  [( 1  +  (/  +J))  •  quality  -factor]  (E) 

First,  the  user  selects  a  quality  factor  from  1  to  25.  Quality  factor  1  corresponds  to  best  quality, 
while  25  corresponds  to  worst  quahty.  Thus,  using  Equation  E,  the  quantization  matrix  for 
quahty  2  would  be: 
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5  7  9  11  (Ml) 

7  9  11  13 

9  11  13  15 


As  an  exan5)le,  if  the  DCT  matrix  before  quantization  were: 


92 

3 

-9 

-7 

1 

00 

62 

1 

-18 

(M2) 

-52 

-36 

-10 

14 

-39 

-58 

12 

17  , 

after  dequantization  using  Ml,  we  would  have: 


90  0-7  0 

-35  -56  9  11  (M3) 

-84  54  0  -13 

-45  -33  0  0 

The  difference  matrix  is: 

23  -2-7 

-4-2  3  6 

0  6  1-6 

-7  -3  -10  14 
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which  are  rather  small  entries  for  a  quality  factor  of  2.  Thus,  we  would  conclude  that  the 
quantization  error  affects  the  image  quality. 

Coding 

The  final  step  in  corcqpression  is  coding  the  quantized  images.  Recall  that  the  coefficient 
block  (0,0)  represents  the  DC-value  (i.e.,  the  mean  intensity)  of  the  corresponding  image  block. 
Usually,  the  adjacent  image  block  has  a  similar  mean  intensity  value,  and,  therefore,  for  encoding 
purpose,  the  absolute  value  of  the  (0,0)  coefficient-block  is  replaced  by  a  relative  value.  Since 
adjacent  blocks  in  an  image  are  usually  highly  correlated,  coding  a  given  DC-value  as  the 
difference  from  the  previous  DC-value  typically  produces  a  very  small  number.  Coefficients  are 
encoded  using  run-length  encoding  of  zero-value.  Since  a  large  number  of  coefficients  are 
truncated  to  zero,  this  encoding  scheme  is  very  efficient.  Finally,  a  zig-zag  reordering  of 
coefficients  is  performed  to  increase  the  run  length  of  zero  coefficients  (see  e.g.,  Pennebaker  & 
Mitchell,  1993). 
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